11 research outputs found

    Distributed Mega-Datasets: The Need for Novel Computing Primitives

    Get PDF
    With the ongoing digitalization, an increasing number of sensors is becoming part of our digital infrastructure. These sensors produce highly, even globally, distributed data streams. The aggregate data rate of these streams far exceeds local storage and computing capabilities. Yet, for radically new services (e.g., predictive maintenance and autonomous driving), which depend on various control loops, this data needs to be analyzed in a timely fashion. In this position paper, we outline a system architecture that can effectively handle distributed mega-datasets using data aggregation. In doing so, we point out two research challenges: the need for (1) novel computing primitives that allow us to aggregate data at scale across multiple hierarchies (i.e., time and location) while answering a multitude of a priori unknown queries, and (2) transfer optimizations that enable rapid local and global decision making.
    Funding: EC/H2020/679158/EU/Resolving the Tussle in the Internet: Mapping, Architecture, and Policy Making/ResolutioNet
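    A minimal sketch of such an aggregation primitive, assuming a fixed time hierarchy (minute/hour/day) and location hierarchy (e.g., region/site/machine); the class name and levels below are illustrative, not taken from the paper:

```python
# Sketch: pre-aggregate a sensor stream across a time hierarchy and a location
# hierarchy so that a priori unknown aggregate queries can later be answered
# from the rollups alone, without shipping the raw stream.
from collections import defaultdict
from datetime import datetime

TIME_LEVELS = ("%Y-%m-%d %H:%M", "%Y-%m-%d %H", "%Y-%m-%d")  # minute, hour, day
LOCATION_DEPTH = 3                                           # region/site/machine

class HierarchicalAggregate:
    """Keeps (count, sum, min, max) per (time bucket, location prefix)."""

    def __init__(self):
        self.cells = defaultdict(lambda: [0, 0.0, float("inf"), float("-inf")])

    def ingest(self, ts: datetime, location: tuple, value: float):
        for fmt in TIME_LEVELS:
            for depth in range(1, LOCATION_DEPTH + 1):
                cell = self.cells[(ts.strftime(fmt), location[:depth])]
                cell[0] += 1
                cell[1] += value
                cell[2] = min(cell[2], value)
                cell[3] = max(cell[3], value)

    def query(self, time_bucket: str, location_prefix: tuple):
        count, total, lo, hi = self.cells[(time_bucket, location_prefix)]
        return {"count": count, "mean": total / count if count else None,
                "min": lo, "max": hi}

agg = HierarchicalAggregate()
agg.ingest(datetime(2019, 5, 2, 14, 31), ("eu", "berlin", "press-7"), 42.0)
print(agg.query("2019-05-02 14", ("eu", "berlin")))  # hourly, site-level rollup
```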

    Online Replication Strategies for Distributed Data Stores

    Get PDF
    The rate at which data is produced at the network edge, e.g., collected from sensors and Internet of Things (IoT) devices, will soon exceed the storage and processing capabilities of a single system and the capacity of the network. Thus, data will need to be collected and preprocessed in distributed data stores - as part of a distributed database - at the network edge. Yet, even in this setup, the transfer of query results will incur prohibitive costs. To further reduce the data transfers, patterns in the workloads must be exploited. Particularly in IoT scenarios, we expect data access to be highly skewed. Most data will be store-only, while a fraction will be popular. Here, the replication of popular raw data, as opposed to the shipment of partially redundant query results, can reduce the volume of data transfers over the network. In this paper, we design online strategies to decide between replicating data from the data stores and forwarding the queries and retrieving their results. Our insight is that by profiling the data's access patterns we can lower the data transfer cost and the corresponding response times. We evaluate the benefit of our strategies using two real-world datasets.
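    A minimal sketch of one plausible online rule, assuming a ski-rental-style threshold per data partition (the strategies and cost model in the paper may differ); partition names and sizes below are illustrative:

```python
# Sketch: keep forwarding query results for a partition until their cumulative
# size exceeds the cost of replicating the partition's raw data, then replicate
# it to the query site so later queries are answered locally.
class ReplicationDecider:
    def __init__(self, raw_size_bytes: dict):
        self.raw_size = raw_size_bytes   # raw data size per partition
        self.shipped = {}                # result bytes shipped so far
        self.replicated = set()

    def on_query(self, partition: str, result_size_bytes: int) -> str:
        """Return 'local', 'replicate' (this query tips the threshold), or 'forward'."""
        if partition in self.replicated:
            return "local"
        self.shipped[partition] = self.shipped.get(partition, 0) + result_size_bytes
        if self.shipped[partition] >= self.raw_size[partition]:
            self.replicated.add(partition)
            return "replicate"
        return "forward"

decider = ReplicationDecider({"sensor-A": 10_000_000})
for size in (3_000_000, 4_000_000, 5_000_000):
    print(decider.on_query("sensor-A", size))  # forward, forward, replicate
```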

    Edge Replication Strategies for Wide-Area Distributed Processing

    Get PDF
    The rapid digitalization across industries comes with many challenges. One key problem is how the ever-growing and volatile data generated at distributed locations can be efficiently processed to inform decision making and improve products. Unfortunately, wide-area network capacity cannot cope with the growth of the data at the network edges. Thus, it is imperative to decide which data should be processed in-situ at the edge and which should be transferred and analyzed in data centers. In this paper, we study two families of proactive online data replication strategies, namely ski-rental and machine learning algorithms, to decide which data is processed at the edge, close to where it is generated, and which is transferred to a data center. Our analysis using real query traces from a Global 2000 company shows that such online replication strategies can significantly reduce the data transfer volume, in many cases by up to 50% compared to naive approaches, and achieve close-to-optimal performance. After analyzing their shortcomings in terms of ease of use and performance, we propose a hybrid strategy that combines the advantages of both competitive and machine learning algorithms.
    Funding: EC/H2020/679158/EU/Resolving the Tussle in the Internet: Mapping, Architecture, and Policy Making/ResolutioNet; BMBF, 01IS18025A, Verbundprojekt BIFOLD-BBDC: Berlin Institute for the Foundations of Learning and Data; BMBF, 01IS18037A, Verbundprojekt BIFOLD-BZML: Berlin Institute for the Foundations of Learning and Data
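    A minimal sketch of how such a hybrid rule could look, assuming a learned predictor backed by a ski-rental fallback (the paper's concrete combination may differ); all names and thresholds are illustrative:

```python
# Sketch: trust the machine-learning prediction only when it is confident;
# otherwise fall back to the competitive (ski-rental) rule, which keeps the
# worst-case transfer cost within roughly twice the offline optimum.
def hybrid_decision(shipped_bytes: int, raw_size_bytes: int,
                    predicted_future_bytes: float, confidence: float,
                    confidence_threshold: float = 0.8) -> str:
    """Return 'replicate' or 'forward' for the next query against a dataset."""
    if confidence >= confidence_threshold:
        # Learned path: replicate if the predicted remaining result traffic alone
        # would exceed the one-off cost of copying the raw data to the data center.
        return "replicate" if predicted_future_bytes >= raw_size_bytes else "forward"
    # Robust path: replicate once the traffic already shipped reaches the raw size.
    return "replicate" if shipped_bytes >= raw_size_bytes else "forward"

print(hybrid_decision(shipped_bytes=2_000_000, raw_size_bytes=10_000_000,
                      predicted_future_bytes=30_000_000, confidence=0.9))  # replicate
```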

    Dual SGLT-1 and SGLT-2 inhibition improves left atrial dysfunction in HFpEF

    Get PDF
    Background: Sodium-glucose linked transporter type 2 (SGLT-2) inhibition has been shown to reduce cardiovascular mortality in heart failure independently of glycemic control and to prevent the onset of atrial arrhythmias, a common co-morbidity in heart failure with preserved ejection fraction (HFpEF). The mechanism behind these effects is not fully understood, and it remains unclear whether they could be further enhanced by additional SGLT-1 inhibition. We investigated the effects of chronic treatment with the dual SGLT-1&2 inhibitor sotagliflozin on left atrial (LA) remodeling and cellular arrhythmogenesis (i.e., atrial cardiomyopathy) in a metabolic syndrome-related rat model of HFpEF. Methods: 17-week-old ZSF-1 obese rats, a metabolic syndrome-related model of HFpEF, and wild-type rats (Wistar Kyoto) were fed 30 mg/kg/d sotagliflozin for 6 weeks. At 23 weeks, LA were imaged in-vivo by echocardiography. In-vitro, Ca2+ transients (CaT; electrically stimulated, caffeine-induced) and spontaneous Ca2+ release were recorded by ratiometric microscopy using Ca2+-sensitive fluorescent dyes (Fura-2) during various experimental protocols. Mitochondrial structure (dye: Mitotracker), Ca2+ buffer capacity (dye: Rhod-2), mitochondrial depolarization (dye: TMRE) and production of reactive oxygen species (dye: H2DCF) were visualized by confocal microscopy. Statistical analysis was performed with two-way analysis of variance followed by post-hoc Bonferroni and Student's t-tests, as applicable. Results: Sotagliflozin ameliorated LA enlargement in HFpEF in-vivo. In-vitro, LA cardiomyocytes in HFpEF showed an increased incidence and amplitude of arrhythmic spontaneous Ca2+ release events (SCaEs). Sotagliflozin significantly reduced the magnitude of SCaEs, while their frequency was unaffected. Sotagliflozin lowered the diastolic [Ca2+] of CaT at baseline and in response to glucose influx, possibly related to an approximately 50% increase of sodium-calcium exchanger (NCX) forward-mode activity. Sotagliflozin prevented mitochondrial swelling and enhanced mitochondrial Ca2+ buffer capacity in HFpEF. Sotagliflozin improved mitochondrial fission and reactive oxygen species (ROS) production during glucose starvation and averted Ca2+ accumulation upon glycolytic inhibition. Conclusion: The SGLT-1&2 inhibitor sotagliflozin ameliorated LA remodeling in metabolic HFpEF. It also improved distinct features of Ca2+-mediated cellular arrhythmogenesis in-vitro (i.e., magnitude of SCaEs, mitochondrial Ca2+ buffer capacity, diastolic Ca2+ accumulation, NCX activity). The safety and efficacy of combined SGLT-1&2 inhibition for the treatment and/or prevention of atrial cardiomyopathy-associated arrhythmias should be further evaluated in clinical trials.

    Data-Driven Transfer Optimization for Big Data in the Industrial Internet of Things (Daten-getriebene Übertragungsoptimierung für Big Data im Industriellen Internet der Dinge)

    No full text
    In the last two decades, the Internet of Things (IoT) has grown from a mere vision to everyday reality. Its fundamental idea is that devices become interconnected with each other and with digital services. The consumer side of the IoT, the Consumer Internet of Things (CIoT), has become omnipresent in the form of wearables, virtual assistants, and smart home solutions. The industrial side of the IoT, the Industrial Internet of Things (IIoT), has received less attention from the general public. The IIoT takes the shape of industrial-grade devices, from trucks to industrial robots, that are equipped with sensors and networking chipsets. It promises to reduce waste, increase machine lifespans, improve energy efficiency, and enable mass customization.
    The CIoT predominantly creates big data sparsely across wide areas, e.g., distributed over many households. CIoT applications collect and process this data in the cloud. In contrast, the IIoT predominantly creates big data at industrial facilities that are densely populated with devices. Because these industrial facilities are often connected to the cloud by low-bandwidth access networks, IIoT big data cannot be entirely transferred to the cloud. Simultaneously, industrial facilities are often equipped with limited computing resources. This creates a data-compute asymmetry where most data stays at resource-constrained industrial facilities, and only a fraction is transferred to the resource-rich cloud. Unmitigated, this network bottleneck delays the deployment of IIoT applications. This thesis introduces software solutions that reduce the impact of the network bottleneck.
    Systems processing IIoT big data face complexity from both the data sources and the application requirements. On the one side, the data is generated by inherently hierarchical and distributed industrial processes and retains these qualities. On the other side, IIoT applications have diverse requirements for data access and processing (e.g., requiring database-like access to historical IIoT big data or processing recent IIoT big data as data streams). This work proposes a high-level architecture that connects both sides using novel computing primitives. Our novel computing primitives flexibly aggregate and combine data across hierarchies and locations. As part of our architecture, we introduce data-driven transfer optimizations to reduce the impact of the network bottleneck. The remainder of the thesis presents three case studies that implement data-driven transfer optimizations for different data processing frameworks.
    In our first case study, IIoT applications in the cloud access a data store at an industrial facility. They face a trade-off between processing individual queries at the industrial facility and transferring raw data to the cloud. We introduce online replication strategies that make fine-granular choices based on data access patterns. In our second case study, an IIoT application identifies the top-k most relevant objects (e.g., machine failures) across multiple industrial facilities. We introduce a new fixed-phase distributed top-k algorithm. This algorithm uses fewer phases than related work while simultaneously reducing the data transfer volume compared to the state of the art. In our final case study, IIoT applications process data streams using dataflow programs. Dataflow programs process data by moving it through an operator graph. A sudden rise in the data input rate or a software or hardware failure risks increasing the dataflow program's latency and decreasing its throughput. We introduce a load shedding solution that mitigates this risk and simultaneously balances data loss against the loss of previously completed work. Our work enables IIoT applications for resource- and bandwidth-constrained industrial facilities.
    Funding: EC/H2020/679158/EU/Resolving the Tussle in the Internet: Mapping, Architecture, and Policy Making/ResolutioNet; BMBF, 01IS12056, Software Campus (Die DNA des IoT: Distribute and Aggregate)
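    A minimal sketch of the load-shedding trade-off from the final case study, as an illustration of the idea rather than the thesis's algorithm; the tuple structure and capacity are assumptions:

```python
# Sketch: when a dataflow operator's input queue exceeds its capacity, drop the
# tuples that would waste the least previously completed work, i.e. those that
# have passed through the fewest upstream operators so far.
from dataclasses import dataclass

@dataclass
class StreamTuple:
    work_done: int   # number of upstream operators already applied
    payload: object

def shed_load(queue: list[StreamTuple], capacity: int):
    """Return (kept, dropped); keep the tuples with the most prior work."""
    ranked = sorted(queue, key=lambda t: t.work_done, reverse=True)
    return ranked[:capacity], ranked[capacity:]

backlog = [StreamTuple(3, "a"), StreamTuple(0, "b"),
           StreamTuple(1, "c"), StreamTuple(5, "d")]
kept, dropped = shed_load(backlog, capacity=2)
print([t.payload for t in kept], [t.payload for t in dropped])  # ['d', 'a'] ['b', 'c']
```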

    PaDIS emulator: an emulator to evaluate CDN-ISP collaboration

    No full text
    We present the PaDIS Emulator, a fully automated platform to evaluate CDN-ISP collaboration for better content delivery, traffic engineering, and cost reduction. The PaDIS Emulator enables researchers as well as CDN and ISP operators to evaluate the benefits of collaboration using their own operational networks, configurations, and cost functions. The PaDIS Emulator consists of three components: the network emulation, the collaboration mechanism, and the performance monitor. Together, these components provide scalable emulation of the interactions between one or more ISPs and multiple CDNs. The PaDIS Emulator's design is flexible enough to implement a wide range of collaboration mechanisms on virtualized or real hardware and to evaluate them before their introduction into operational networks.
    Authors: Ingmar Poese, Benjamin Frank, Simon Knight, Niklas Semmler, and Georgios Smaragdakis
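    A minimal sketch of the kind of CDN-ISP collaboration the emulator is built to evaluate, as an illustration of the general idea rather than the emulator's actual interface; server names and path costs are hypothetical:

```python
# Sketch: for each client request, the ISP-side collaboration mechanism re-ranks
# the CDN's candidate servers using the ISP's own view of path costs, and the
# CDN serves the request from the top-ranked server.
def isp_rank(candidates: list[str], path_cost: dict[str, float]) -> list[str]:
    """ISP side: order candidate servers by network cost (lowest first)."""
    return sorted(candidates, key=lambda server: path_cost.get(server, float("inf")))

def cdn_select(candidates: list[str], isp_ranker) -> str:
    """CDN side: ask the ISP to rank the candidates, then serve from the best one."""
    return isp_ranker(candidates)[0]

# Hypothetical per-server path costs as the ISP might derive them from its topology.
costs = {"cache-ix1": 2.0, "cache-metro3": 0.5, "cache-remote9": 7.5}
print(cdn_select(list(costs), lambda c: isp_rank(c, costs)))  # cache-metro3
```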

    PaDIS emulator

    No full text